Python setup

Heatmaps

A heatmap is a two-dimensional colored grid that visualizes the intensity or level of one variable across the levels of two other variables.

Heatmaps are commonly used in bioinformatics to display the expression levels of genes across samples, often together with a cluster analysis that places similar genes or samples next to each other.

Heatmaps can be drawn using all the packages we have seen in this class.

Example data

We'll use average monthly temperatures in Washington, DC from 1971 to 2020, obtained from the National Weather Service (link).

.footnote[This data was extracted using the R package datapasta]

Weather data: seaborn
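A minimal sketch of a seaborn heatmap for a year-by-month table. The numbers are placeholders (a seasonal curve plus random noise), not the actual National Weather Service values, and the names `temps`, `months` and `years` are chosen here for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
years = list(range(1971, 1976))  # a short slice, to keep the example small

# Placeholder values only: a seasonal curve plus noise, NOT real weather data.
rng = np.random.default_rng(0)
seasonal = 57 - 22 * np.cos(2 * np.pi * np.arange(12) / 12)
temps = pd.DataFrame(
    seasonal + rng.normal(0, 2, size=(len(years), 12)),
    index=pd.Index(years, name="Year"),
    columns=months,
)

# Rows become the y axis, columns the x axis, cell values the color.
fig, ax = plt.subplots(figsize=(8, 3))
sns.heatmap(temps, cmap="coolwarm", ax=ax,
            cbar_kws={"label": "Temperature (placeholder)"})
ax.set_xlabel("Month")
fig.tight_layout()
```

seaborn expects the data in wide form, one row per year and one column per month; data in long form would first need to be pivoted.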

Weather data: matplotlib
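A comparable sketch in plain matplotlib, again with placeholder numbers rather than the real temperatures. Here `imshow` draws the grid, and the tick labels and colorbar have to be added by hand.

```python
import matplotlib.pyplot as plt
import numpy as np

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
years = np.arange(1971, 1976)

# Placeholder values only: a seasonal curve plus noise, NOT real weather data.
rng = np.random.default_rng(0)
seasonal = 57 - 22 * np.cos(2 * np.pi * np.arange(12) / 12)
temps = seasonal + rng.normal(0, 2, size=(len(years), 12))

fig, ax = plt.subplots(figsize=(8, 3))
im = ax.imshow(temps, cmap="coolwarm", aspect="auto")

# imshow places cell (i, j) at integer coordinates, so ticks go at 0, 1, 2, ...
ax.set_xticks(np.arange(len(months)))
ax.set_xticklabels(months)
ax.set_yticks(np.arange(len(years)))
ax.set_yticklabels([str(y) for y in years])
fig.colorbar(im, ax=ax, label="Temperature (placeholder)")
fig.tight_layout()
```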

Weather data: Plotly Express

Weather data: Altair

Adding clustering

Re-ordering rows and columns

In many contexts, especially bioinformatics, heatmaps are combined with cluster analysis to display similarities between units: rows and columns are re-ordered so that similar units sit next to each other. Hierarchical clustering is the typical choice.

We will use a breast cancer data set to see whether some individuals have similar profiles across the recorded variables, and whether that similarity is related to outcome.
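To make the re-ordering idea concrete, here is a small sketch using SciPy's hierarchical clustering on synthetic data. The two groups, the average linkage and the Euclidean metric are assumptions made for this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

rng = np.random.default_rng(1)

# Synthetic data: two groups of rows with clearly different patterns.
group_a = rng.normal(0, 0.3, size=(4, 6)) + np.array([2, 2, 2, 0, 0, 0])
group_b = rng.normal(0, 0.3, size=(4, 6)) + np.array([0, 0, 0, 2, 2, 2])
data = np.vstack([group_a, group_b])
labels = np.array(["a"] * 4 + ["b"] * 4)

# Shuffle the rows so the groups start out interleaved.
shuffle = rng.permutation(len(data))
data, labels = data[shuffle], labels[shuffle]

# Cluster the rows, then read off the left-to-right dendrogram leaf order.
row_link = linkage(data, method="average", metric="euclidean")
row_order = leaves_list(row_link)

reordered = data[row_order]  # this matrix is what the heatmap would display
print("before:", "".join(labels))
print("after: ", "".join(labels[row_order]))
```

After re-ordering, rows from the same group are adjacent, which is what makes block patterns visible in the heatmap.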

Breast cancer: creating the heatmap

In seaborn, the clustermap function takes care of the clustering for us. Note that we scale each row to have mean 0 and variance 1, which makes differences in patterns easier to see. We also cluster both rows and columns using the correlation distance (1 - correlation).
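A minimal sketch of such a call, using random numbers in place of the breast cancer data. The `profiles` table and its column names are hypothetical, and the average linkage method is an assumption.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Stand-in data: 20 individuals by 6 variables of random noise.
rng = np.random.default_rng(2)
profiles = pd.DataFrame(
    rng.normal(size=(20, 6)),
    columns=[f"feature_{i}" for i in range(6)],
)

grid = sns.clustermap(
    profiles,
    z_score=0,             # standardize each row to mean 0, variance 1
    metric="correlation",  # distance = 1 - Pearson correlation
    method="average",      # linkage method (an assumption here)
    cmap="vlag",
    center=0,              # put the middle of the colormap at 0
    figsize=(6, 8),
)

# Row order chosen by the clustering, as positions in the original table.
print(grid.dendrogram_row.reordered_ind)
```

In seaborn, `z_score=0` standardizes rows and `z_score=1` standardizes columns; `standard_scale` is the alternative that rescales to the 0-1 range instead.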

Breast cancer: adding to the heatmap

We will add a column to the heatmap that color-codes the outcomes, so we can see if the clustering aligns with the outcomes.
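One way to do this in seaborn is the `row_colors` argument of clustermap. The sketch below uses random stand-in data; the `outcome` labels ("good", "poor") and the palette are invented for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Stand-in data: 20 individuals by 6 variables, plus a made-up outcome.
rng = np.random.default_rng(3)
profiles = pd.DataFrame(
    rng.normal(size=(20, 6)),
    columns=[f"feature_{i}" for i in range(6)],
)
outcome = pd.Series(
    rng.choice(["good", "poor"], size=20),
    index=profiles.index,
    name="outcome",
)

# Map each outcome label to a color; seaborn aligns the Series by index.
palette = {"good": "tab:blue", "poor": "tab:red"}
row_colors = outcome.map(palette)

grid = sns.clustermap(
    profiles,
    z_score=0,
    metric="correlation",
    row_colors=row_colors,  # drawn as a colored strip beside the rows
    cmap="vlag",
    center=0,
)
```

If the clustering aligns with the outcome, the colored strip shows long runs of a single color rather than a speckled mix.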

Embellishing the heatmap

Adding marginal histograms and annotation

We will use a data set of measles cases in the US from 1930 to 2000 to demonstrate how to create marginal histograms around a heatmap.
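As a rough sketch of the layout, the code below places bar charts of column and row totals around a heatmap using a matplotlib GridSpec. The counts are random placeholders rather than measles data, and the state names and year range are invented.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder counts: 4 made-up states by 10 years, NOT real case data.
rng = np.random.default_rng(4)
years = np.arange(1930, 1940)
states = ["State A", "State B", "State C", "State D"]
cases = rng.poisson(lam=50, size=(len(states), len(years)))

# 2 x 2 grid: heatmap bottom-left, margins on top and right.
fig = plt.figure(figsize=(8, 4))
grid = fig.add_gridspec(2, 2, width_ratios=[4, 1], height_ratios=[1, 3],
                        wspace=0.05, hspace=0.05)
ax_heat = fig.add_subplot(grid[1, 0])
ax_top = fig.add_subplot(grid[0, 0], sharex=ax_heat)
ax_right = fig.add_subplot(grid[1, 1], sharey=ax_heat)

ax_heat.imshow(cases, aspect="auto", cmap="Reds")
ax_heat.set_xticks(np.arange(len(years)))
ax_heat.set_xticklabels([str(y) for y in years], rotation=90)
ax_heat.set_yticks(np.arange(len(states)))
ax_heat.set_yticklabels(states)

# Marginal totals: per year on top, per state on the right.
ax_top.bar(np.arange(len(years)), cases.sum(axis=0), color="gray")
ax_right.barh(np.arange(len(states)), cases.sum(axis=1), color="gray")
ax_top.tick_params(labelbottom=False)
ax_right.tick_params(labelleft=False)
```

Sharing the x and y axes keeps each bar aligned with its heatmap column or row.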

We first read in the data and do a bit of data munging.

Measles: seaborn

Measles: Altair